Add amdgpu target #134740

Merged
merged 1 commit into from
Feb 10, 2025

Conversation

Flakebi
Contributor

@Flakebi Flakebi commented Dec 25, 2024

Add amdgpu target to rustc and enable the LLVM target.

Fix compiling `core` for amdgpu:
The amdgpu backend makes heavy use of different address spaces. This
leads to situations where a pointer in one addrspace needs to be cast
to a pointer in a different addrspace. `bitcast` is invalid for this
case; `addrspacecast` needs to be used.

Fix compilation failures that created bitcasts for such cases by
creating pointer casts (which create an `addrspacecast` under the hood)
instead.

MCP: rust-lang/compiler-team#823
Tracking issue: #135024
Kinda related to the original amdgpu tracking issue #51575 (though that one has been closed for a while).
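
The cast rule described above can be sketched as a small toy model in Rust. This is only an illustration under stated assumptions, not the code in rustc's `codegen_llvm`: `AddressSpace`, `PtrCast`, and `select_ptr_cast` are hypothetical names, and the address-space numbers used in the usage note are placeholders.

```rust
/// Hypothetical stand-in for an LLVM address space number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AddressSpace(pub u32);

/// The kind of cast a pointer-cast helper has to emit (toy model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PtrCast {
    /// Same address space: a plain `bitcast` (or no instruction at all) is fine.
    BitcastOrNoop,
    /// Different address spaces: `bitcast` is invalid, so emit `addrspacecast`.
    AddrSpaceCast,
}

/// Pick the cast for converting a pointer in `from` to a pointer in `to`.
pub fn select_ptr_cast(from: AddressSpace, to: AddressSpace) -> PtrCast {
    if from == to {
        PtrCast::BitcastOrNoop
    } else {
        PtrCast::AddrSpaceCast
    }
}
```

With this model, `select_ptr_cast(AddressSpace(5), AddressSpace(0))` selects `PtrCast::AddrSpaceCast`, while two equal address spaces select `PtrCast::BitcastOrNoop`.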

@rustbot
Collaborator

rustbot commented Dec 25, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @GuillaumeGomez (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should keep the review labels (S-waiting-on-review and S-waiting-on-author) up to date, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 25, 2024
@rustbot
Collaborator

rustbot commented Dec 25, 2024

These commits modify compiler targets.
(See the Target Tier Policy.)

This PR changes how LLVM is built. Consider updating src/bootstrap/download-ci-llvm-stamp.

Some changes occurred in src/doc/rustc/src/platform-support

cc @Noratrieb

This PR modifies config.example.toml.

If appropriate, please update CONFIG_CHANGE_HISTORY in src/bootstrap/src/utils/change_tracker.rs.

@jieyouxu
Member

r? jieyouxu

@rustbot rustbot assigned jieyouxu and unassigned GuillaumeGomez Dec 25, 2024
@workingjubilee
Member

cc @eddyb Hello, tagging you for domain expertise if you want to chime in.

@jieyouxu
Member

jieyouxu commented Dec 25, 2024

Thanks for the PR, @Flakebi. I'm going to request that you open an MCP at https://github.com/rust-lang/compiler-team/issues/ to gauge team consensus for adding this target, primarily to give compiler team members an opportunity to ask clarifying questions and register possible concerns, since:

  • Adding this target requires modifying codegen_llvm in a non-trivial way (emitting at times
    addrspacecast instead of bitcast). In particular, as you stated, this target has a
    non-conventional addrspace usage model that I believe we don't quite observe in other existing
    targets:

    The amdgpu backend makes heavy use of different address spaces. This leads to situations
    where a pointer in one addrspace needs to be cast to a pointer in a different addrspace.
    bitcast is invalid for this case; addrspacecast needs to be used.

  • This requires modifying the LLVM build to also include the AMDGPU backend.

  • This target seems to be intended for many different CPUs of varying hardware generation, but the
    present target definition defaults to gfx900.

Note that adding more "conventional" Tier 3 targets usually does not need to go through the MCP process, but this target does not look so conventional.

@jieyouxu jieyouxu added needs-mcp This change is large enough that it needs a major change proposal before starting work. A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Dec 25, 2024
@jieyouxu
Member

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 25, 2024
@Flakebi
Contributor Author

Flakebi commented Dec 26, 2024

Thank you for the quick review!

I opened an MCP here: rust-lang/compiler-team#823

@traviscross
Contributor

cc @ZuseZ4

@jieyouxu jieyouxu added S-waiting-on-MCP Status: PR has a compiler MCP and is waiting for the compiler MCP to complete. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. needs-mcp This change is large enough that it needs a major change proposal before starting work. labels Dec 26, 2024
@bors
Contributor

bors commented Dec 27, 2024

☔ The latest upstream changes (presumably #134822) made this pull request unmergeable. Please resolve the merge conflicts.

@bors bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Dec 27, 2024
@jieyouxu jieyouxu removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Dec 31, 2024
@Flakebi Flakebi mentioned this pull request Jan 2, 2025
16 tasks
@rustbot rustbot added the has-merge-commits PR has merge commits, merge with caution. label Jan 2, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2025
Add amdgpu target

try-job: dist-loongarch64-linux
try-job: dist-loongarch64-muls
try-job: dist-powerpc64-linux
@rust-log-analyzer
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
dist-loongarch64-linux
dist-loongarch64-muls
dist-powerpc64-linux
##[endgroup]
INFO:root:Job type: TryRunType(custom_jobs=['dist-loongarch64-linux', 'dist-loongarch64-muls', 'dist-powerpc64-linux'])
  File "/home/runner/work/rust/rust/src/ci/github-actions/ci.py", line 314, in <module>
    calculate_job_matrix(data)
  File "/home/runner/work/rust/rust/src/ci/github-actions/ci.py", line 266, in calculate_job_matrix
    jobs = calculate_jobs(run_type, job_data)
  File "/home/runner/work/rust/rust/src/ci/github-actions/ci.py", line 153, in calculate_jobs
    raise Exception(
Exception: Custom job(s) `['dist-loongarch64-muls']` not found in auto jobs
##[error]Process completed with exit code 1.
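
The failure in the log above comes from a requested try-job name that does not match any known job. A minimal sketch of that kind of check, written in Rust purely for illustration (the real check lives in the Python script `ci.py` named in the traceback; `unknown_jobs` is a hypothetical helper, not a port of it):

```rust
/// Return the requested job names that are not among the known auto jobs.
/// Hypothetical helper: it only illustrates the membership check.
pub fn unknown_jobs<'a>(requested: &[&'a str], auto_jobs: &[&str]) -> Vec<&'a str> {
    requested
        .iter()
        .copied()
        // Keep only the names for which no auto job has the same name.
        .filter(|name| !auto_jobs.iter().any(|job| job == name))
        .collect()
}
```

Given the three requested names from the log and a list of auto jobs that contains `dist-loongarch64-musl`, the only name returned is the misspelled `dist-loongarch64-muls`.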

@matthiaskrgr
Member

Ah, it's called musl and not muls :)
@bors try

@bors
Contributor

bors commented Feb 9, 2025

⌛ Trying commit 56795fb with merge ad8d586...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2025
Add amdgpu target

try-job: dist-loongarch64-linux
try-job: dist-loongarch64-musl
try-job: dist-powerpc64-linux
@bors
Contributor

bors commented Feb 9, 2025

☀️ Try build successful - checks-actions
Build commit: ad8d586 (ad8d58687ff5e1b7935c4b25be4e251d15443948)

@saethlin
Member

saethlin commented Feb 9, 2025

@bors r=workingjubilee

@bors
Contributor

bors commented Feb 9, 2025

💡 This pull request was already approved, no need to approve it again.

@bors
Contributor

bors commented Feb 9, 2025

📌 Commit 56795fb has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 9, 2025
@bors
Contributor

bors commented Feb 10, 2025

⌛ Testing commit 56795fb with merge c03c38d...

@bors
Contributor

bors commented Feb 10, 2025

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing c03c38d to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 10, 2025
@bors bors merged commit c03c38d into rust-lang:master Feb 10, 2025
7 checks passed
@rustbot rustbot added this to the 1.86.0 milestone Feb 10, 2025
@bors bors mentioned this pull request Feb 10, 2025
@rust-timer
Collaborator

Finished benchmarking commit (c03c38d): comparison URL.

Overall result: ❌ regressions - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

                             mean    range           count
Regressions ❌ (primary)     0.7%    [0.2%, 3.8%]    55
Regressions ❌ (secondary)   0.6%    [0.2%, 1.0%]    47
Improvements ✅ (primary)    -       -               0
Improvements ✅ (secondary)  -       -               0
All ❌✅ (primary)           0.7%    [0.2%, 3.8%]    55

Max RSS (memory usage)

Results (primary 2.1%, secondary 3.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean    range             count
Regressions ❌ (primary)     2.2%    [0.4%, 5.8%]      93
Regressions ❌ (secondary)   3.4%    [0.5%, 6.1%]      96
Improvements ✅ (primary)    -6.8%   [-6.8%, -6.8%]    1
Improvements ✅ (secondary)  -1.5%   [-1.5%, -1.5%]    1
All ❌✅ (primary)           2.1%    [-6.8%, 5.8%]     94

Cycles

Results (primary 2.4%, secondary -2.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean    range             count
Regressions ❌ (primary)     2.4%    [0.6%, 3.6%]      3
Regressions ❌ (secondary)   2.4%    [1.3%, 3.2%]      4
Improvements ✅ (primary)    -       -                 0
Improvements ✅ (secondary)  -5.4%   [-9.7%, -4.0%]    7
All ❌✅ (primary)           2.4%    [0.6%, 3.6%]      3

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 777.876s -> 781.62s (0.48%)
Artifact size: 329.17 MiB -> 348.30 MiB (5.81%)
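
The percentages in the two lines above are plain relative changes. As a quick sanity check, here is a small Rust sketch that recomputes them; `percent_change` and `round2` are hypothetical helper names, and rounding to two decimals is an assumption about how the report was formatted.

```rust
/// Relative change from `before` to `after`, in percent.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Round to two decimal places (assumed to match the report's precision).
pub fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

Applied to the numbers above, `round2(percent_change(777.876, 781.62))` gives 0.48 and `round2(percent_change(329.17, 348.30))` gives 5.81. The artifact-size difference is 348.30 - 329.17 = 19.13 MiB.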

@rustbot rustbot added the perf-regression Performance regression. label Feb 10, 2025
@Kobzol
Contributor

Kobzol commented Feb 10, 2025

The syn regression is noise, the rest is possibly caused by the fact that the shipped LLVM is now 19 MiB larger due to the inclusion of the AMDGPU target.

@Flakebi Flakebi deleted the amdgpu-target branch February 10, 2025 10:04
@Mark-Simulacrum Mark-Simulacrum added the perf-regression-triaged The performance regression has been triaged. label Feb 10, 2025
@Mark-Simulacrum
Member

I'm not sure that size is really a driver for more instruction counts, but it does look like the mere possibility of cross-compiling to AMDGPU is enabling more passes/logic(?) even if presumably those don't do anything on x86. Maybe an optimization opportunity for LLVM/clang? It might be unavoidable with the architecture LLVM has today though.

Sampling a few cachegrind diffs:

helloworld:

--------------------------------------------------------------------------------
-- File:function summary
--------------------------------------------------------------------------------
  Ir______  file:function

<  575,695  ???:
   844,600    llvm::PassRegistry::enumerateWith(llvm::PassRegistrationListener*)
  -782,786    llvm::PassRegistry::enumerateWith(llvm::PassRegistrationListener*) [clone .warm]
   -69,691    llvm::FPPassManager::runOnFunction(llvm::Function&)
    60,129    llvm::MVT::getScalableVectorVT(llvm::MVT, unsigned int)
    54,986    ecache_evict
   -49,525    llvm::SelectionDAGISel::CodeGenAndEmitDAG()
   -41,251    std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > std::copy<llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrSet<llvm::BasicBlock*, 8u>, false, ll>
    40,624    llvm::StringMapImpl::RehashTable(unsigned int)
    39,256    llvm::StringMapImpl::LookupBucketFor(llvm::StringRef, unsigned int)
    38,978    edata_cache_get
    38,420    llvm::InstCombinerImpl::visitCallBase(llvm::CallBase&)
    37,509    llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>::invalidate(llvm::LazyCallGraph::SCC&, llvm::PreservedAnalyses const&)
    36,129    llvm::SelectionDAG::Legalize()
   -33,750    llvm::PassRegistry::registerPass(llvm::PassInfo const&, bool)
   -33,501    llvm::InstCombinerImpl::visitCallInst(llvm::CallInst&)
   -31,525    tcache_bin_flush_small
    31,343    llvm::PMTopLevelManager::findAnalysisPassInfo(void const*) const
    30,580    eset_remove

clap:

--------------------------------------------------------------------------------
-- File:function summary
--------------------------------------------------------------------------------
  Ir__________  file:function

<  209,305,246  ???:
  -230,519,105    llvm::computeKnownBitsFromContext(llvm::Value const*, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&)
  -132,162,101    computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&) [clone
   124,686,235    computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&)
   114,060,111    llvm::LiveIntervalCalc::calculate(llvm::LiveInterval&, bool)
  -113,662,179    llvm::LiveIntervals::computeVirtRegs()
   104,218,878    llvm::PointerMayBeCaptured(llvm::Value const*, llvm::CaptureTracker*, unsigned int)
   -97,035,202    llvm::SelectionDAGISel::CodeGenAndEmitDAG()
   -90,098,002    llvm::AAResults::getModRefInfo(llvm::Instruction const*, std::optional<llvm::MemoryLocation> const&, llvm::AAQueryInfo&)
    89,433,445    llvm::RAGreedy::calculateRegionSplitCostAroundReg(unsigned short, llvm::AllocationOrder&, llvm::BlockFrequency&, unsigned int&, unsigned int&)
   -83,724,622    llvm::RAGreedy::calculateRegionSplitCost(llvm::LiveInterval const&, llvm::AllocationOrder&, llvm::BlockFrequency&, unsigned int&, bool)
    82,820,615    computeKnownBits(llvm::Value const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&) [clone
   -72,563,818    llvm::ScheduleDAGSDNodes::BuildSchedGraph(llvm::AAResults*)
    71,100,555    llvm::SelectionDAG::Legalize()
   -70,774,808    std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > std::copy<llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrSet<llvm::BasicBlock*, 8u>, false, llvm::GraphTraits<llvm::BasicBlock*> >, std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > >(llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrSet<llvm::BasicBlo>
    68,499,576    llvm::ScheduleDAGSDNodes::AddSchedEdges()
   -67,376,856    (anonymous namespace)::TailRecursionEliminator::eliminate(llvm::Function&, llvm::TargetTransformInfo const*, llvm::AAResults*, llvm::OptimizationRemarkEmitter*, llvm::DomTreeUpdater&) [clone
    66,298,390    computePointerICmp(llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::SimplifyQuery const&)
    62,749,680    std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > std::__copy_move_a2<false, llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrSet<llvm::BasicBlock*, 8u>, false, llvm::GraphTraits<llvm::BasicBlock*> >, std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > >(llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrS>
   -62,208,865    simplifyICmpInst(unsigned int, llvm::Value*, llvm::Value*, llvm::SimplifyQuery const&, unsigned int) [clone
    59,398,553    llvm::SCCPInstVisitor::markUsersAsChanged(llvm::Value*)
   -58,896,104    llvm::RegAllocBase::allocatePhysRegs()
    57,899,944    llvm::RAGreedy::selectOrSplitImpl(llvm::LiveInterval const&, llvm::SmallVectorImpl<llvm::Register>&, llvm::SmallSet<llvm::Register, 16u, std::less<llvm::Register> >&, llvm::SmallVector<std::pair<llvm::LiveInterval const*, llvm::MCRegister>, 8u>&, unsigned int)

@lqd
Member

lqd commented Feb 10, 2025

what the eff. we should probably look into this more, it’s super weird

@Kobzol
Contributor

Kobzol commented Feb 11, 2025

I remember that binary/dynamic library size increases have sometimes also increased icounts because the dynamic linker does more work. But based on Cachegrind, it looks like LLVM is actually doing more work. Maybe it iterates over more passes that were enabled by the amdgpu target?

@lqd
Member

lqd commented Feb 11, 2025

The max-rss increases also look unexpected, and are numerous enough not to be measurement noise. (Could memory allocation be hiding in these ??? entries of the cachegrind reports? That sometimes happens for me instead of jemalloc/malloc showing up. Some tests with local builds and better debuginfo could be interesting. That would also help with checking the cycles and wall-time results, which don't seem very stable here.)

Does this need more time to bake maybe? @Mark-Simulacrum you’ve marked this as triaged because it’s less actionable on our side than in llvm, right?

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 11, 2025
[experiment] dont init anything except x86

What if we do not always init all LLVM targets? This might fix the regression in rust-lang#134740

r? `@ghost`
`@rustbot` label +S-experimental

btw, there is a similar list of targets here: https://github.com/rust-lang/rust/blob/c182ce9cbc8c29ebc1b4559d027df545e6cdd287/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp#L81-L186, but it is missing amdgpu. Does amdgpu work without it?

kick perf run please
@workingjubilee
Member

hate to go "ooh, LLVM troubles, let's tell Nikita!" but uhhhh "weird LLVM perf" really does need the Vibe Sense of that kind of expertise, sooo cc @nikic

@nikic
Contributor

nikic commented Feb 11, 2025

Adding the amdgpu target shouldn't make any additional passes run -- additional cost from registering additional passes etc is plausible though.

max-rss increasing with increasing code size is pretty common.
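
To make the distinction above concrete, here is a toy model in Rust of a global pass registry. It is only a sketch under stated assumptions: `ToyPassRegistry` and its methods are hypothetical and do not mirror LLVM's `PassRegistry` API, and the pass names in the usage note are made up. It shows how registering another target's passes can make every registry enumeration visit more entries even though the passes run for the host target stay the same.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a global pass registry (hypothetical, not LLVM's API).
#[derive(Default)]
pub struct ToyPassRegistry {
    /// Maps a qualified pass name to the target that registered it.
    passes: BTreeMap<String, &'static str>,
}

impl ToyPassRegistry {
    /// Register all passes of one target under a target-qualified name.
    pub fn register_target(&mut self, target: &'static str, pass_names: &[&str]) {
        for name in pass_names {
            self.passes.insert(format!("{target}-{name}"), target);
        }
    }

    /// Visit every registered pass and return how many entries were visited.
    /// The work grows with the total number of registered passes.
    pub fn enumerate_with(&self, mut listener: impl FnMut(&str)) -> usize {
        let mut visited = 0;
        for name in self.passes.keys() {
            listener(name);
            visited += 1;
        }
        visited
    }

    /// Number of passes that belong to `target`, i.e. the ones that could run for it.
    pub fn passes_for(&self, target: &str) -> usize {
        self.passes.values().filter(|t| **t == target).count()
    }
}
```

In this model, registering an `"amdgpu"` target next to an `"x86"` target raises the count returned by `enumerate_with`, while `passes_for("x86")` is unchanged. Whether this effect explains the measured regression is not established in this thread.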

Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.